CLAIMS:

1. An apparatus for identifying one or more portions of
data in a database for comparison with a query input by
a user, the query and the portions of data each
comprising a sequence of sub-word units, the apparatus
comprising:

a memory for storing data defining a plurality of
sub-word unit classes, each class comprising sub-word
units that are confusable with other sub-word units in
the same class;

a memory for storing an index having a plurality of 
entries, each of which comprises:

(i) an identifier for identifying the entry; 

(ii) a key associated with the entry and which is
related to the identifier for the entry in a
predetermined manner; and

(iii) a number of pointers which point to portions
of data in the database which correspond to the key for
the entry;

wherein each key comprises a sequence of sub-word
unit classifications which is derived from a
corresponding sequence of sub-word units appearing in the
database by classifying each of the sub-word units in the
sequence into one of the plurality of sub-word unit
classes;

means for classifying each of the sub-word units in
the input query into one of the plurality of sub-word
unit classes and for defining one or more sub-sequences
of query sub-word unit classifications;

means for determining a corresponding identifier for 
an entry in said index for each of said one or more sub-
sequences of query sub-word unit classifications;

means for comparing the key associated with each of
the determined identifiers with the corresponding sub- 
sequence of query sub-word unit classifications; and 

means for retrieving one or more pointers from said
index in dependence upon the output of said comparing
means, which one or more pointers identify said one or
more portions of data in the database for comparison with
the input query.
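
By way of illustration only, the classifying means of claim 1 might be sketched as follows. The class table, the phoneme symbols and the use of overlapping fixed-width windows as the "sub-sequences" are all assumptions made for this sketch; the claim does not fix any of them.

```python
# Hypothetical class table: each class groups phonemes assumed to be
# confusable with one another. The groupings are illustrative only.
PHONEME_CLASSES = {
    0: {"p", "b", "t", "d", "k", "g"},    # plosives
    1: {"f", "v", "s", "z", "sh", "th"},  # fricatives
    2: {"m", "n", "ng"},                  # nasals
    3: {"iy", "ih", "eh", "ae"},          # front vowels
    4: {"aa", "ao", "uh", "uw"},          # back vowels
}

# Invert the table so that each sub-word unit maps to its class number.
CLASS_OF = {unit: cls for cls, members in PHONEME_CLASSES.items() for unit in members}


def classify(units):
    """Replace each sub-word unit by the number of its class."""
    return [CLASS_OF[unit] for unit in units]


def sub_sequences(classes, width):
    """Return every overlapping window of `width` classifications (assumed scheme)."""
    return [tuple(classes[i:i + width]) for i in range(len(classes) - width + 1)]


query = ["k", "ae", "t", "s"]
query_classes = classify(query)
windows = sub_sequences(query_classes, 3)
```

Because "k" and "t" fall in the same assumed class, a query in which one is misrecognised as the other yields the same classification sequence.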

2. An apparatus according to claim 1, wherein said sub- 
word units are phonemes or phoneme-like units. 

3. An apparatus according to claim 1, wherein at least 
ten sub-word unit classes are defined in advance. 

4. An apparatus according to claim 1, wherein each key
is related to the corresponding identifier by a
predetermined mathematical function.

5. An apparatus according to claim 4, wherein each key
is related to the corresponding identifier by the
following equation:

$$IDX = \left[ \sum_{i=1}^{W} C[i] \, K_c^{\,i-1} \right] \text{Mod } S$$

where K_c is the number of sub-word unit classes, S
is the number of entries in the index, C[i] is the number
of the sub-word class to which the i-th sub-word unit in
the sequence of sub-word units corresponding to the key
belongs and W is the number of sub-word unit
classifications in each key.
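
By way of illustration only, the mapping from a key to its identifier might be computed as below. The sketch assumes the equation is read as a base-K_c positional sum (the i-th classification weighted by K_c to the power i-1) reduced modulo S; the function and parameter names are hypothetical.

```python
def key_to_identifier(key, num_classes, num_entries):
    """Map a sequence of class numbers C[1..W] to an index identifier.

    Assumed reading of the claimed equation:
        IDX = (sum over i of C[i] * K_c**(i - 1)) mod S
    where K_c = num_classes and S = num_entries.
    """
    total = 0
    for i, class_number in enumerate(key, start=1):
        total += class_number * num_classes ** (i - 1)
    return total % num_entries


# With 5 classes and 7 index entries:
idx_a = key_to_identifier((0, 3, 0), num_classes=5, num_entries=7)  # 15 mod 7
idx_b = key_to_identifier((3, 0, 1), num_classes=5, num_entries=7)  # 28 mod 7
```

Distinct keys can map to the same identifier whenever S is smaller than the number of possible keys, which is why the key stored with each entry is compared against the query sub-sequence.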

6. An apparatus according to claim 1, wherein said
determining means is operable to identify a new
identifier for another entry in said index for a
subsequence of query sub-word unit classifications if
said comparing means determines that the key for the
identifier is not the same as the subsequence of query
sub-word unit classifications.

7. An apparatus according to claim 6, wherein said
determining means is operable to determine a new
identifier using the following equation:

$$IDX = [\, IDX + V \,] \text{ Mod } S$$

where IDX is the identifier, S is the number of
entries in the index and V is a predetermined number.
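
By way of illustration only, the re-determination of an identifier after a key mismatch might be sketched as a probing loop. Representing a null key as `None`, stopping the search at a null key, and bounding the loop at S attempts are assumptions of this sketch, as are all names; V should share no common factor with S if every entry is to be reachable.

```python
def next_identifier(idx, step, num_entries):
    """New identifier after a mismatch: (IDX + V) mod S."""
    return (idx + step) % num_entries


def find_entry(keys, query_key, start_idx, step):
    """Probe from start_idx until the stored key equals query_key.

    `keys` holds the key stored at each identifier; None stands in for a
    null key (assumed representation). Returns the matching identifier,
    or None when a null key is reached or every entry has been tried.
    """
    idx = start_idx
    for _ in range(len(keys)):
        stored = keys[idx]
        if stored is None:       # nothing was ever stored along this probe path
            return None
        if stored == query_key:  # key matches the query sub-sequence
            return idx
        idx = next_identifier(idx, step, len(keys))
    return None


keys = [(3, 0, 1), (0, 3, 0), None, (1, 1, 1), None]
hit = find_entry(keys, (1, 1, 1), start_idx=1, step=2)
miss = find_entry(keys, (2, 2, 2), start_idx=1, step=2)
```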

8. An apparatus according to claim 1, wherein the key
for one or more of said entries is a null key indicating
that there are no pointers stored in the index for that
entry.

9. An apparatus according to claim 4, wherein said
determining means is operable to determine a
corresponding identifier for each subsequence of query
sub-word unit classifications using said predetermined
mathematical function.

10. An apparatus according to claim 1, wherein said
input query is a typed query and wherein the apparatus
further comprises means for converting the typed query
into said sequence of sub-word units.

11. An apparatus according to claim 1, wherein said
input query is a spoken query and wherein the apparatus
further comprises a speech recognition system for
processing the spoken query and for outputting said
sequence of sub-word units.

12. An apparatus for searching a database in response to
a query input by a user, the database comprising a
plurality of sequences of sub-word units and the query
comprising at least one sequence of sub-word units, the
apparatus comprising:

an apparatus according to any of claims 1 to 11 for
identifying one or more portions of data in the database
for comparison with the input query; and

means for comparing the one or more sequences of
query sub-word units with the identified one or more
portions of data in said database.

13. An apparatus according to claim 12, wherein said
means for comparing said input query with said portions
of data in the database uses a dynamic programming
comparison technique.
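
By way of illustration only, one familiar dynamic programming comparison is the edit (Levenshtein) distance between two sequences of sub-word units. The claim does not name a particular technique, so the choice of edit distance and the unit costs for insertion, deletion and substitution are assumptions of this sketch.

```python
def edit_distance(query, candidate):
    """Minimum number of insertions, deletions and substitutions that
    turn `query` into `candidate`, computed row by row."""
    # prev[j] is the distance between the first i-1 query units and the
    # first j candidate units; it starts as the row for an empty query.
    prev = list(range(len(candidate) + 1))
    for i, q_unit in enumerate(query, start=1):
        curr = [i]  # distance from i query units to an empty candidate
        for j, c_unit in enumerate(candidate, start=1):
            substitution = prev[j - 1] + (0 if q_unit == c_unit else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


score = edit_distance(["k", "ae", "t"], ["b", "ae", "t"])
```

A lower score indicates a closer match, so the identified portions of data could be ranked by this score.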

14. An apparatus according to claim 12, further
comprising means for retrieving one or more data files in
dependence upon the results of said comparing means.

15. An apparatus for identifying one or more portions of
data in a database for comparison with a query input by
a user, the query and the portions of data each
comprising a sequence of features, the apparatus
comprising:

a memory for storing data defining a plurality of
feature classes, each class comprising features that are
confusable with other features in the same class;

a memory for storing an index having a plurality of 
entries, each of which comprises:

(i) an identifier for identifying the entry; 

(ii) a key associated with the entry and which is
related to the identifier for the entry in a 
predetermined manner; and 

(iii) a number of pointers which point to portions 
of data in the database which correspond to the key for 
the entry; 

wherein each key comprises a sequence of feature
classifications which is derived from a corresponding
sequence of features appearing in the database by
classifying each of the features in the sequence into one 
of the plurality of feature classes; 

means for classifying each of the features in the
input query into one of the plurality of feature classes
and for defining one or more sub-sequences of query
feature classifications;

means for determining a corresponding identifier for 
an entry in said index for each of said one or more sub-
sequences of query feature classifications;

means for comparing the key associated with each of
the determined identifiers with the corresponding sub- 
sequence of query feature classifications; and 

means for retrieving one or more pointers from said 
index in dependence upon the output of said comparing 
means, which one or more pointers identify said one or 
more portions of data in the database for comparison with 
the input query. 

16. Data defining an index for use in searching a
database, the data comprising: 

data defining a respective identifier for each of a 
plurality of entries in the index; 

data defining a respective key for each of the
plurality of entries, which keys are related to the
corresponding identifiers in a predetermined manner; and

data defining a respective one or more pointers for
a plurality of the entries, which pointers point to
locations within the database corresponding to the key
for the entry;

wherein each key comprises a sequence of sub-word
unit classifications which is derived from a
corresponding sequence of sub-word units appearing in the
database by classifying each of the sub-word units in the
sequence into one of a plurality of sub-word unit
classes, the sub-word unit classes being defined in
advance and each comprising sub-word units that are
confusable with other sub-word units in the same class.
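
By way of illustration only, index data of the kind recited in claim 16 might be generated as below. The sketch assumes fixed-width overlapping windows over the database, a positional-sum-modulo-S relation between key and identifier, a fixed probing step on collision, and `None` as a null key; every name is hypothetical.

```python
def build_index(database, class_of, num_classes, num_entries, width, step=1):
    """Return (keys, pointers): keys[idx] is the key stored at identifier
    idx (None = null key) and pointers[idx] lists the database positions
    whose classification sequence equals that key."""
    keys = [None] * num_entries
    pointers = [[] for _ in range(num_entries)]
    for pos in range(len(database) - width + 1):
        key = tuple(class_of[unit] for unit in database[pos:pos + width])
        # Assumed key-to-identifier relation: positional sum modulo S.
        idx = sum(c * num_classes ** i for i, c in enumerate(key)) % num_entries
        for _ in range(num_entries):
            if keys[idx] is None:      # free entry: claim it for this key
                keys[idx] = key
            if keys[idx] == key:       # entry for this key: add a pointer
                pointers[idx].append(pos)
                break
            idx = (idx + step) % num_entries  # occupied by another key
        else:
            raise RuntimeError("index is full")
    return keys, pointers


class_of = {"a": 0, "b": 1, "c": 2}  # hypothetical classes
keys, pointers = build_index(list("abcabc"), class_of,
                             num_classes=3, num_entries=5, width=2)
```

In this toy run the keys (1, 2) and (2, 0) both map to identifier 2, so the second is moved on by the probing step and stored at identifier 4.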

17. A method of identifying one or more portions of data
in a database for comparison with a query input by a
user, the query and the portions of data each comprising
a sequence of sub-word units, the method comprising the
steps of:

storing data defining a plurality of sub-word unit
classes, each class comprising sub-word units that are
confusable with other sub-word units in the same class;

storing an index having a plurality of entries, each 
of which comprises: 

(i) an identifier for identifying the entry; 

(ii) a key associated with the entry and which is 
related to the identifier for the entry in a 
predetermined manner; and

(iii) a number of pointers which point to portions
of data in the database which correspond to the key for
the entry;

wherein each key comprises a sequence of sub-word
unit classifications which is derived from a
corresponding sequence of sub-word units appearing in the
database by classifying each of the sub-word units in the
sequence into one of the plurality of sub-word unit
classes;

classifying each of the sub-word units in the input
query into one of the plurality of sub-word unit classes
and defining one or more sub-sequences of query sub-
word unit classifications;

determining a corresponding identifier for an entry 
in said index for each of said one or more sub-sequences
of query sub-word unit classifications; 

comparing the key associated with each of the
determined identifiers with the corresponding sub-
sequence of query sub-word unit classifications; and

retrieving one or more pointers from said index in
dependence upon the output of said comparing step, which
one or more pointers identify said one or more portions
of data in the database for comparison with the input
query.
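
By way of illustration only, the classifying, determining, comparing and retrieving steps of claim 17 might be combined as follows. The stored index, the class table, the window width, the key-to-identifier relation and the probing step are all hypothetical values chosen for this sketch.

```python
# Hypothetical class table: "a"/"d" and "b"/"p" are assumed confusable.
CLASS_OF = {"a": 0, "d": 0, "b": 1, "p": 1, "c": 2}
NUM_CLASSES = 3
WIDTH = 2   # classifications per key (assumed)
STEP = 1    # probing increment on a key mismatch (assumed)

# Hypothetical stored index: list position = identifier, each entry is
# (key, pointers); a key of None stands in for a null key.
INDEX = [
    (None, []),
    (None, []),
    ((1, 2), [1, 4]),
    ((0, 1), [0, 3]),
    ((2, 0), [2]),
]


def candidate_positions(query_units):
    """Return database positions worth comparing against the query."""
    classes = [CLASS_OF[unit] for unit in query_units]            # classifying
    found = set()
    for start in range(len(classes) - WIDTH + 1):
        sub = tuple(classes[start:start + WIDTH])
        idx = sum(c * NUM_CLASSES ** i for i, c in enumerate(sub)) % len(INDEX)  # determining
        for _ in range(len(INDEX)):
            key, pointers = INDEX[idx]
            if key is None:            # null key: no pointers for this sub-sequence
                break
            if key == sub:             # comparing
                found.update(pointers)  # retrieving
                break
            idx = (idx + STEP) % len(INDEX)
    return sorted(found)


exact = candidate_positions(["a", "b", "c"])
confusable = candidate_positions(["d", "p"])
```

The query "d p" retrieves the same pointers as "a b" because both classify to (0, 1), which is the intended effect of indexing by class rather than by sub-word unit.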

18. A method according to claim 17, wherein said sub-
word units are phonemes or phoneme-like units.

19. A method according to claim 17, wherein at least ten
sub-word unit classes are defined in advance.

20. A method according to claim 17, wherein each key is
related to the corresponding identifier by a 
predetermined mathematical function. 

21. A method according to claim 20, wherein each key is
related to the corresponding identifier by the following
equation:

$$IDX = \left[ \sum_{i=1}^{W} C[i] \, K_c^{\,i-1} \right] \text{Mod } S$$

where K_c is the number of sub-word unit classes, S
is the number of entries in the index, C[i] is the number
of the sub-word class to which the i-th sub-word unit in
the sequence of sub-word units corresponding to the key
belongs and W is the number of sub-word unit
classifications in each key.

22. A method according to claim 17, wherein said
determining step identifies a new identifier for another
entry in said index for a subsequence of query sub-word
unit classifications if said comparing step determines
that the key for the identifier is not the same as the
subsequence of query sub-word unit classifications.

23. A method according to claim 22, wherein said
determining step determines a new identifier using the
following equation:

$$IDX = [\, IDX + V \,] \text{ Mod } S$$

where IDX is the identifier, S is the number of
entries in the index and V is a predetermined number.

24. A method according to claim 17, wherein the key for
one or more of said entries is a null key indicating that
there are no pointers stored in the index for that entry.

25. A method according to claim 23, wherein said 
determining step determines a corresponding identifier 
for each subsequence of query sub-word unit 
classifications using said predetermined mathematical
function. 

26. A method according to claim 17, wherein said input
query is a typed query and wherein the method further
comprises the step of converting the typed query into
said sequence of sub-word units.

27. A method according to claim 17, wherein said input
query is a spoken query and wherein the method further
comprises the step of using a speech recognition system
to process the spoken query to generate said sequence of
sub-word units.

28. A method of searching a database in response to a
query input by a user, the database comprising a
plurality of sequences of sub-word units and the query
comprising at least one sequence of sub-word units, the
method comprising:

the method steps of claim 17 for identifying one or
more portions of data in the database for comparison with
the input query; and the step of

comparing the one or more sequences of query sub-
word units with the identified one or more portions of
data in said database.

29. A method according to claim 28, wherein said
comparing step uses a dynamic programming comparison
technique to compare the input query with said portions
of data.

30. A method according to claim 28, further comprising
the step of retrieving one or more data files in
dependence upon the results of said comparing step.

31. A storage medium storing processor implementable
instructions for controlling a processor to implement the
method of claim 17 or storing the data of claim 16.



32. Processor implementable instructions for controlling
a processor to implement the method of claim 17.

33. An apparatus for identifying one or more portions of
data in a database for comparison with a query input by
a user, the query and the portions of data each
comprising a sequence of sub-word units, the apparatus
being characterised by an index having a plurality of
entries, each of which includes a key comprising a
sequence of sub-word unit classifications, which key is
derived from a corresponding sequence of sub-word units
appearing in the database by classifying each of the sub-
word units in the sequence into one of a plurality of
sub-word unit classes, each class comprising sub-word
units that are confusable with other sub-word units in
the same class.



